{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# COMPSCI 389: Homework 0\n",
    "\n",
    "**Assigned**: January 29, 2026. **Due**: February 5, 2026 at 1:00pm Eastern. **Note**: Submissions received after 1:00pm Eastern on February 12, 2026 will receive no credit.\n",
    "\n",
    "**Submitting**: Upload your submission on Gradescope as a `.pdf`. Converting to a PDF can be complicated, so we encourage you to test this process well in advance of the submission deadlines. We recommend converting to HTML, opening the HTML file in a browser, and then printing or exporting to a PDF from your browser. We do not recommend directly converting to a PDF, since this requires installing xelatex. To convert to HTML in VS Code, press `ctrl+shift+p` (`cmd+shift+p` on macOS) and type `export`; you should see an option to export to HTML.\n",
    "\n",
    "**Note**: Keep your `.ipynb` file, as we may request it directly (via email).\n",
    "\n",
    "**Note**: When converting to a PDF file, ensure that all of your code cells have been executed. The results of these executions *must* be included in your submitted PDF."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instructions\n",
    "\n",
    "Complete the questions below, replacing the <font color=\"blue\">blue</font> text with your own answers (your answers do not need to remain in blue). Do **not** modify the <font color=\"green\">green</font> text. Try to answer the questions without consulting your notes or any online material. If you cannot, then consult your notes, and if absolutely necessary, consult course materials (slides, notebooks) and/or Wikipedia. Do **not** use other sources or tools like ChatGPT. Complete this part of the assignment on your own (do **not** work with others).\n",
    "\n",
    "After you have completed all of the questions, at the bottom of this assignment you will find a link to another notebook, `Homework 0 Solutions.ipynb`. This contains the solutions, along with instructions for ensuring that your answers are correct and sufficient. Make another pass through your homework assignment, replacing the <font color=\"green\">green</font> text with descriptions of what you missed for each question and the fixes necessary to make your answer correct. **The solutions file may include additional instructions, which may include additional content to respond to even if you got a question correct (e.g., additional reflection).** During this second stage, when you are replacing the <font color=\"green\">green</font> text, you may reference the solutions, work with others, and use any tools (including ChatGPT).\n",
    "\n",
    "You will only submit this assignment once, after replacing both the blue and green text. You do not need to submit the assignment between the first and second passes. Grading for each question will be based on whether you followed this process, arrived at the correct answers, and provided sufficient discussion/text in the end. Points will be deducted if you did not make a reasonable effort to answer the question initially, if your final answer remains incorrect, or if your answers were not sufficiently clear (so write in full sentences with proper punctuation, and convey your arguments clearly). Other than verifying that you made a reasonable initial effort for your initial answers (<font color=\"blue\">blue</font>), points will **not** be deducted due to *initial* answers being incorrect. Hence, there is no reason to break the rules to obtain correct answers initially."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1: Short Answer\n",
    "\n",
    "Answer the following questions with at least a few sentences, and no more than roughly one page of text."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. [4 points] Read the following paragraph. Type your name in the answer section to sign, indicating that you have read and understand this statement. If you have any questions about this statement, email Prof. Thomas at pthomas@umass.edu.\n",
    "\n",
    "```\n",
    "I understand the late assignment policy described in the syllabus and reviewed during the first lecture. I recognize that, although the policy is lenient, its enforcement will be strict. For example, if I leave the assignment until the deadline for late submissions (e.g., 1pm February 12, for this assignment), and I miss the deadline by a single second, I will get a zero on the assignment. I understand that reasons for submitting after the late deadline like \"Gradescope was down\", \"I have the .ipynb file, but can't get it to print to a PDF\", or \"I was sick after the assignment was due, but before the late submission deadline\" will not result in my receiving a further extension. This is because the assignment was due on the stated due date, and the 7 days given after that date were precisely to cover these kinds of situations.\n",
    "```\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this with your typed name, as a signature indicating that you read and understand the above statement.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">No additional response is required for this question. You do not need to change this green text.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. [4 points] Read the following paragraph. Type your name in the answer section to sign, indicating that you have read and understand this statement. If you have any questions about this statement, email Prof. Thomas at pthomas@umass.edu.\n",
    "\n",
    "```\n",
    "I understand that submissions emailed to the instructor or TA will not be accepted. Assignments must be submitted through Gradescope. The 7 late days given are meant to provide time to resolve any submission issues I may have.\n",
    "```\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this with your typed name, as a signature indicating that you read and understand the above statement.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">No additional response is required for this question. You do not need to change this green text.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. [4 points] Read the following paragraph. Type your name in the answer section to sign, indicating that you have read and understand this statement. If you have any questions about this statement, email Prof. Thomas at pthomas@umass.edu.\n",
    "\n",
    "```\n",
    "I understand that homework assignments are typically due at 1pm (the start time for lecture), not midnight. I also understand that the late deadline is typically also 1pm (not midnight). I also understand that the solution file may ask for additional work and answers in the green text fields that are not mentioned in the assignment document.\n",
    "```\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this with your typed name, as a signature indicating that you read and understand the above statement.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. [4 points] Read the following paragraph. Type your name in the answer section to sign, indicating that you have read and understand this statement. If you have any questions about this statement, email Prof. Thomas at pthomas@umass.edu.\n",
    "\n",
    "```\n",
    "I understand that regrade requests must be submitted (via Gradescope) within one week of assignments being graded. If I submit a regrade request after that, it will not be considered. Of particular note, at the end of the semester if my grade is a fraction of a percentage from being a higher letter grade, and I ask for an assignment from earlier in the semester to be regraded, it will not be regraded.\n",
    "```\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this with your typed name, as a signature indicating that you read and understand the above statement.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">No additional response is required for this question. You do not need to change this green text.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. [10 points] What is the definition of machine learning?\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6. [10 points] Give an example of an algorithm (hypothetical or real) that falls within the field of *artificial intelligence* (AI) but **not** the field of *machine learning* (ML). Explain why this algorithm is an AI algorithm but not an ML algorithm. Do not use the example from the first lecture (this example is in the slides for the first lecture if you missed class that day).\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2: Programming\n",
    "\n",
    "The purpose of this question is to verify that you have properly installed Python and the required packages. You should not (yet) try to understand what this code is doing; just verify that it runs and produces the expected output."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7. [10 points] Run the code block below. If you get errors, you may use any tools (e.g., ChatGPT) to help you resolve them. For example, this error:\n",
    "\n",
    "> ModuleNotFoundError: No module named 'numpy'\n",
    "\n",
    "indicates that the numpy package is not installed. This can be fixed by running `pip install numpy`. The script below relies on pandas, scikit-learn, numpy, and matplotlib. Visual Studio Code may also prompt you to install the ipykernel package.\n",
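    "\n",
    "To check which of these packages are missing before running the cell, the following is a minimal sketch using Python's standard `importlib.util.find_spec`. This function returns `None` when a module cannot be found by the interpreter that is running the code. In VS Code, that interpreter is the selected kernel, which may differ from the Python used in your terminal. The names `REQUIRED_PACKAGES` and `find_missing_packages` are hypothetical, chosen for this example. The check only confirms that each package can be located, not that its version is compatible.\n",
    "\n",
    "```python\n",
    "import importlib.util\n",
    "\n",
    "# Hypothetical mapping from import name to the name passed to pip\n",
    "# (scikit-learn is installed as scikit-learn but imported as sklearn).\n",
    "REQUIRED_PACKAGES = {\n",
    "    \"numpy\": \"numpy\",\n",
    "    \"pandas\": \"pandas\",\n",
    "    \"sklearn\": \"scikit-learn\",\n",
    "    \"matplotlib\": \"matplotlib\",\n",
    "}\n",
    "\n",
    "def find_missing_packages(required=REQUIRED_PACKAGES):\n",
    "    \"\"\"Return the pip names of packages whose import name cannot be found.\"\"\"\n",
    "    return [\n",
    "        pip_name\n",
    "        for import_name, pip_name in required.items()\n",
    "        if importlib.util.find_spec(import_name) is None\n",
    "    ]\n",
    "\n",
    "missing = find_missing_packages()\n",
    "if missing:\n",
    "    print(\"Missing packages; try: pip install \" + \" \".join(missing))\n",
    "else:\n",
    "    print(\"All required packages were found.\")\n",
    "```\n",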
    "\n",
    "The grading for this question will be contingent on your submitted PDF displaying the expected output of the program below. Verify that the printed PDF includes the output of the Python cell below. If you submit a PDF that does not show the cell below having executed properly, you will get a zero for this problem. Note that this same policy will be applied for future assignments - always verify that the output of your code is in the submitted PDF.\n",
    "\n",
    "The expected output is three lines of text similar to:\n",
    "```\n",
    "Iteration 0/2, Loss: 8.4242\n",
    "Iteration 1/2, Loss: 145.8458\n",
    "Iteration 2/2, Loss: 5168.6650\n",
    "```\n",
    "followed by a line plot with title \"Gradient Descent Loss, Polynomial Degree: 2\", followed by two more lines similar to:\n",
    "```\n",
    "Test MSE: 5150.7023\n",
    "Standard Error of MSE: 92.1197\n",
    "```\n",
    "It is not a concern if the numbers in the printed lines differ slightly. The cell downloads the data set each time it runs, and the first run also imports several large packages. On my computer (with a wired connection), this cell took 10.9 seconds to run the first time. Subsequent runs (without restarting the Python kernel) are much faster; on my machine they take 0.2 seconds. This timing information should help you identify whether your program is hanging for some reason (e.g., if it runs for more than a couple of minutes the first time, something is likely wrong)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.base import BaseEstimator\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "from sklearn.preprocessing import PolynomialFeatures, StandardScaler\n",
    "\n",
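    "def mean_squared_error(predictions, labels):\n",
    "    # Average of the squared differences between predictions and labels\n",
    "    return np.mean((predictions - labels) ** 2)\n",
    "\n",
    "def compute_gradients(X, y, weights):\n",
    "    # Gradient of the mean squared error with respect to the weights:\n",
    "    # (2 / n) * X^T (X w - y), where n is the number of rows of X\n",
    "    predictions = X.dot(weights)\n",
    "    errors = predictions - y\n",
    "    return 2 / X.shape[0] * X.T.dot(errors)\n",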
    "\n",
    "class PolynomialRegressionGD(BaseEstimator):\n",
    "    def __init__(self, learning_rate, iterations=1000, polynomial_degree=2):\n",
    "        self.learning_rate = learning_rate\n",
    "        self.iterations = iterations\n",
    "        self.polynomial_degree = polynomial_degree\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        self.scaler_ = StandardScaler().fit(X)\n",
    "        X_scaled = self.scaler_.transform(X)\n",
    "        self.poly = PolynomialFeatures(degree=self.polynomial_degree)\n",
    "        X_poly = self.poly.fit_transform(X_scaled)  # Use standardized features\n",
    "        num_features = X_poly.shape[1]\n",
    "        self.weights = np.zeros(num_features)\n",
    "        self.loss_history = []\n",
    "        predictions = X_poly.dot(self.weights)\n",
    "        loss = mean_squared_error(predictions, y)\n",
    "        print(f\"Iteration 0/{self.iterations}, Loss: {loss:.4f}\")\n",
    "        for i in range(1, self.iterations + 1):\n",
    "            # Compute the gradient of the loss function\n",
    "            gradients = compute_gradients(X_poly, y, self.weights)\n",
    "\n",
    "            # Update the weights using gradient descent\n",
    "            self.weights -= self.learning_rate * gradients\n",
    "\n",
    "            # Compute, print, and store the resulting loss\n",
    "            loss = mean_squared_error(X_poly.dot(self.weights), y)\n",
    "            self.loss_history.append(loss)\n",
    "            print(f\"Iteration {i}/{self.iterations}, Loss: {loss:.4f}\")\n",
    "        return self\n",
    "\n",
    "    def predict(self, X):\n",
    "        X_scaled = self.scaler_.transform(X)\n",
    "        X_poly = self.poly.transform(X_scaled)\n",
    "        return X_poly.dot(self.weights)\n",
    "\n",
    "df = pd.read_csv(\"https://people.cs.umass.edu/~pthomas/courses/COMPSCI_389/GPA.csv\", delimiter=',')\n",
    "X = df.iloc[:, :-1]\n",
    "y = df.iloc[:, -1]\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, shuffle=True)\n",
    "\n",
    "def run(alpha):\n",
    "    iterations = 2\n",
    "    polynomial_degree = 2\n",
    "    model = PolynomialRegressionGD(\n",
    "        learning_rate=alpha,\n",
    "        iterations=iterations,\n",
    "        polynomial_degree=polynomial_degree\n",
    "    )\n",
    "    model.fit(X_train, y_train)\n",
    "    plt.plot(range(1, iterations + 1), model.loss_history)\n",
    "    plt.xlabel('Iterations')\n",
    "    plt.ylabel('Mean Squared Error')\n",
    "    plt.yscale('log')\n",
    "    plt.title(f'Gradient Descent Loss, Polynomial Degree: {polynomial_degree}')\n",
    "    plt.show()\n",
    "    predictions = model.predict(X_test)\n",
    "    mse_test = mean_squared_error(predictions, y_test)\n",
    "    print(f\"Test MSE: {mse_test:.4f}\")\n",
    "    squared_errors = (predictions - y_test) ** 2\n",
    "    std_error = np.std(squared_errors) / np.sqrt(len(squared_errors))\n",
    "    print(f\"Standard Error of MSE: {std_error:.4f}\")\n",
    "\n",
    "# Set the learning rate and run the model\n",
    "alpha = 0.1\n",
    "run(alpha)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The solutions can be found here: [https://people.cs.umass.edu/~pthomas/courses/COMPSCI_389_Spring2026/Homework%200%20Solutions.ipynb](https://people.cs.umass.edu/~pthomas/courses/COMPSCI_389_Spring2026/Homework%200%20Solutions.ipynb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}